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Abstract 

The treatment effects of the same therapy observed from multiple clinical 
trials can be very different. Yet the patient characteristics accounting for the 
differences may not be identihable in real practice so that it is necessary to 
estimate and report the overall treatment effect for the general popoulation 
during the development and validation of a new therapy. The non-linear 
structure of the maximum partial likelihood estimates for the (log) hazard 
ratio defined with a Cox proportional hazard model leads to challenges in 
the statistical analyses for combining such clinical trials. In this paper, we 
formulated the expected overall treatment effects using various modeling as¬ 
sumptions. Then we proceeded to propose efficient estimates together with 
a version of Wald test for the combined hazard ratio using only aggregate 
data. Interpretation of the methods are provided in the framework of robust 
data analyses involving misspecihed models. 

Keyword: combining survival trials, misspecihed models, harmonic average. 

1 Introduction 

Multiple clinical trials may be performed to validate a newly developed ther¬ 
apy to account for the variability of the targeted patient population. The 
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data collected from each individual clinical trial may be used to generate 
the efficacy (or safety) estimates for each specihc trial as well as the overall 
efficacy estimate for the overall population to support the effectiveness of 
the therapy. In this paper, we will focus on the analyses of time-to-event 
data (t) using Cox proportional hazard models. Let Z = {Zi, ..., Z^} be 
the covariates available for each enrolled patient. Given Z, the proportional 
hazard model assumes that the hazard function for a patient from the i-th 
(i = 1,..., M) trial can be writtens as 

hi{t\Z) = hjo(t)exp(/3'Z), 

where hio{t) is the baseline hazard function of the i-th trial with unknown 
formulation, = {Pn, ■ ■ ■, Pik} is the vector of log hazard ratios defined 
specihcally for the patients enrolled in the i-th trial. 

Efficient estimates of the trial specihc /3i based on maximal partial likehood 
(MPLE) were well developed since the days of Cox (1972, 1975). In this 
paper, we are more interested in the methods for statistical inference based on 
the information contained in the pooled data from all M trials. Meta analyses 
were invented to address such needs. Yuan (2009) provided a comprehensive 
review of the statistical methods used to generate an overall (log) hazard 
ratio estimate based on multiple clinical trials. The convex combination of 
all the log hazard ratio estimates from each individual trial is a popular choice 
(Wei, Lin and Weissfeld 1989): 

M 

^Lj = Wi/3ij,Wwi + ...+Wm = 1- (1) 

i=l 

Here (3ij (j = 1,... ,k) denotes the MPLEs of (3ij derived from the patient 
data collected from the i-th trial. The weights Wi can be arbitrary constants 
sum up to unity, among which the inverse variance scheme may be the most 
popular option due to the obvious advantage of minimized variance among 
the family of all convex combinations of /dj^’s. Though less common, some 
researchers also recommend using the linear combination of the hazard ratio 
estimates for any given covariate value Zj: 

M 

hij = ^ , Vwi + ... + wm = 1- (2) 

i=l 
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To account for the differences in each [3ij due to randomness inherited from 
the observed data, DerSimonian and Laird (1986) provided an overall treat¬ 
ment effect estimate based on random effect models. Yuan continued to 
propose a meta-ANOVA model and a meta-polynomial model to address the 
differences in /3i/s due to mis-match of baseline information from different 
trials. These methods require a large number of trials (M) to support the 
regression algorithms. Here we are only interested in the cases where very 
limited number (M > 2) of trials are deliverable. 

As far as we know from the research literature, all the statistical models de¬ 
veloped for meta-analyses assume that there exists a set of “true” baseline 
hazard function hoi(-) as well as a unique “true” log hazard ratio (vector) 
(3 through out all the trials designed for the same therapy. The baseline 
hoi{-) may vary by i, while (3 has to be consistent for all the M trials. The 
differences in the estimated log hazard ratios /3i = {/5ii,... ,/3ik} are either 
attributed to the randomness of the observed outcomes or the incomplete¬ 
ness of the covariate data. However, in some cases, such assumptions may be 
far from the truth. The treatment effect on different populations may be es¬ 
sentially different due to discrepancies in some latent patient characteristics. 
The problem is even more acute if the therapy under investigation targets 
specihc gene expressions. The human genome is overwhelmingly complex 
such that even the most well devised targeted treatment can be subject to 
unexpected impacts from genes outside the targeted region. Hence the true 
values of /dds may vary with i because the patient populations enrolled for 
different trials are in fact heterogeneous with regard to their responses to 
the therapy. Even though the inclusion/exclusion criteria and the design of 
these trials may appear to be perfectly aligned, it cannot help to suppress 
the differences in the patients recruited for different trials. It is impossible to 
adjust the hazard ratio estimates for these latent controlling factors because 
they are usually unknown to the researchers or not able to be detected by any 
currently available technology/assay. Instead, the researcher has to compro¬ 
mise with an overall treatment effect estimate derived from the pooled data, 
or from the summary statistics reported for each trial if patient line data are 
not available. 

Mehrotra et. al. (2012) investigated the effect of deviation from the con¬ 
stant hazard ratio assumption underlying the standard stratihed survival 
analyses. Our method proposed in this paper can be used to establish an 
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efficient overall treatment effect estimate covering multiple stratums of high 
variability. Denne et. al. (2014) and Li (2014) provided another good exam¬ 
ple illuminating the necessity of considering different treatment effects {f3i) 
for different trials. One version of an assay may be used to enroll patients by 
a biomarker for a validation trial. Another version of the assay targeting the 
same biomarker may be needed when the treatment is ready for marketing 
due to advances of technology or solely to reduce the cost and time for pa¬ 
tient screening. The two versions of the assay may not perfectly match with 
each other. The discrepancies in the test results may reflect some differences 
in the patient’s biological prohle which are not intended to be captured by 
either assay. Such differences may affect the patient responses to the ther¬ 
apy. Hence the intent-to-treat patient population is divided into multiple 
subgroups with different assay result combinations. It is reasonable to as¬ 
sume that the patient responses to the same targeted treatment are in fact 
different (or even opposite as was observed in some real life examples) across 
these subgroups dehned by both assays (Sargent et. al. 2005). However, only 
the market-ready version of the assays will be available to the patients/client 
laboratories so that it is the overall treatment effect covering all subgroups, 
rather than the treatment effect on patients with specihc test result combi¬ 
nations, that concerns the developer. 

The linear estimate dehned by ([^ or 6^ by ([^ was used as the estimator 
for the overall treatment effect in these mentioned publications. The linear 
estimates has its own merit. It is easy to be implemented for calculation 
and the derivation of its variance is straightforward. It is guaranteed to be 
close to the correct answer if the true values across different trials only 
differ by a small amount of no practical signihcance. However, if the /Sj’s are 
very different, despite the choice of the weights (tCi), the usage of a linear 
estimate is not mathematically justihable because each component of the 
linear combination converges to a completely different I3ij. It is not very 
likely that the limit of the estimator {^Wij3ij) is an exact measurement of 
the overall treatment effect because the (log) hazard ratio dehned by the Cox 
proportional hazard model is obviously nonlinear with regard to the observed 
survival times. As a matter of fact, we will show that in most cases (and 
bi) dehned by the inverse variances coefficients or by the proportion of trials 
sizes leads to over estimate of the true overall hazard with the treatment. 
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The major concerns and challenges for a pharmaceutical researcher facing 
data collected from multiple trials can be summarized by two questions. 
First, what is the proper dehnition of the overall treatment effect? Hazard 
ratio is very different from the other commonly used endpoints such as the 
mean/median survival time and response rate. The later resorts to a natural 
measurement (in most cases, counting) of observed events or time, while haz¬ 
ard ratio is an artihcial concept specihcally invented for the Cox proportional 
hazard model. The definition of a “hazard ratio for the overall population” 
is ambiguous now that the Cox model is no more applicable to a mixed pop¬ 
ulation. In the ideal case one may have all the exact knowledge about the 
true underlying performance of the therapy including but not limited to the 
shapes of the baseline hazards hjo(t) and the values of the log hazard ratios 
/3i. That being said, the overall treatment effect is not a readily dehned value 
as a unique functional of all such baseline functions and parameters. For the 
hrst time in the literature, we point out that the problem has to be addressed 
in the misspecihed model framework. Typically the target to be estimated 
with a misspecihed model is the limit of a chosen estimator. Its dehnition 
has to depend on the modelling assumptions chosen by the researchers. We 
will demonstrate two kinds of such dehnitions in this paper and provide ex- 
plaination for their very delicate diherences. The second question naturally 
comes after the hrst one, i.e., how to generate a statistically efficient esti¬ 
mate for the overall drug efficacy after a proper dehnition? It is even more 
challenging if patient line data are not available but only aggregate statistics 
can be accessed for some of the trials. 


2 Combined hazard ratio as a limit of the 
MPLE from pooled data 

It is sufficient to consider only two independent trials. Generalization of the 
results to cover more trials is straightforward. Assume that n patients were 
enrolled for the hrst clinical trial. The survival times and baseline demo¬ 
graphics of the patients are denoted X = {(Xj, 6i, Zi),i = 1,..., n}. Here Xi 
is the right-censored survival time of the i-th patient, 6i = 0/1 indicates that 
Xi is censored or an observed event time, Zj is a /c-dimensional covariate with 
probability density function fz{zi,..., Zk). In many cases the distribution /z 
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may vary by the trials. For simplicity, we assume the same distribution fz 
for all the trials in this paper. Extension of our methods to accommodate 
different distributions for the covariates is obvious. Assuming independent 
censoring, the patients’ survival times are i.i.d.’s with proportional hazard 

hx{x\Z) = /ixo( 2 ^) exp(Q;'Z). 

By dehnition, the pdf of an uncensored Xi (i.e., 6i = 1) conditioned on Z, is 

hxo{x) 

where hxo{') is an arbitrary baseline hazard function of the hrst trial and 
Hxo{.x) = hxo(u)du is the corresponding culmulative hazard. Similar 
notations can be dehned for the second trial. Let Y = {(Vj, 6j, Zj), j = 
1,..., m} be the data collected from the second trial. Assuming proportional 
hazard and independent censoring, the patients from the second trial has 
hazards 

hyivlZ) = hyoiv) exp{/3'Z). 

The trial-specihc treatment effect estimates {a and /3) and their asymptotic 
variance-covariance matrices can be easily estimated using the MPLEs cal¬ 
culated from the X and Y data respectively. We are particularly interested 
in the setup where the underlying true values oi a ^ (d. Without loss of gen¬ 
erality, let the first component of the covariates {Zn) be the arm indicator 
for the i-th patient. The patient is in the treatment arm if Zn = 1 or he is 
in the control arm if Zn = 0. The value of ai and /3i, the hrst component 
of the regression parameter a and (5 respectively, are of most concern as a 
measurement of the treatment effect. It is convenient to assume ai < Pi for 
all the discussions presented in this paper. 

It is natural to consider the pooled patient line data 

{{Wi,5i,Zi),i = l,...,n-f m} 

{(A^j, (5j, Zj), i 1,..., n} U {(Yj,6j, Zj ),j l,...,?7i}. 

Most researchers consider the MPLE calculated from all n -f m (denoted by 
N) pooled patient line {PL) records not only as an appropriate estimate for 
the overall treatment effects bnt also the best (especially when compared to 
the linear estimates Pl or hi) answer to onr hrst question. The only obvious 


6 



drawback of the MPLE is that it requires the knowledge of all N patient line 
data: 


n+m 


OpL = arg max 


2=1 


exp(6''Zj 


EjeSR. exp(0'Z, 


(3) 


where 3?* is the set of labels for those patients (originally from X or Y) who 
are at risk at time Wi—. 


The overall log hazard ratio 6 is hence dehned as the limit oi 9 pl when both 
n and m —)■ cx). It is worth to point out that one needs to first determine 
what is an appropriate statistic for the overall treatment effect based on both 
statstical and clinical thinking. Then the parameter to be estimated follows 
as the limit of the statistic, not vice versa. A different version of the overall 
treatment effect can be as valid based on other assumptions about the esti¬ 
mating procednre. We will extend the discussion to provide snch an example 
in the Section 4. 


It is eqnivalent to imposing a misspecihed Cox model (working model) on 
the pooled data snch that the combined hazard can be written as 

hw{w[L) = hvi/o(w) exp(0'Z). (4) 

Assnme that the two trials have the same baseline hazard hxo{t) = hyoit) = 
ho(t) for alH > 0 because the control arms are usnally subject to standard 
of care. Snch treatments are well established for the general population. 
The true pdf of the pooled data W is a mixture of two proportional hazard 
models 


where n/(n -|- m) —)■ p as n,m —)■ oo is a hxed ratio of the sample sizes 
controlled by the researcher. The true model for Wi does not satisfy the 
proportional hazard assnmption: 


h{w\Z) 


hoiw) 


p^a'Z^-exp{a'Z)Ho{w) _j_ exp(/3'Z)//o(«') 

p^-exp{a'Z)Hoiw) ^ exp(/3'Z)Ho(«') 


The formnlation of the limit of Opp can be stndied using the techniqnes 
developed for misspecihed models in Strnthers and Kalbheisch (1986) and Lin 
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and Wei (1989). Let hi{t) be the true hazard function of the i-th patient from 
the pooled dataset W and Ri{t) = be the at-risk process at arbitrary- 

time t > 0, z = 1,... ,n + m. It is convenient to dehne the notations following 
the convention of Andersen and Gill (1989), Struthers and Kalbfleisch (1986) 
and Lin and Wei (1989): 


= E 


=E 


N 


Ri{t)hi{t) 

i=l 
N 


z* ,sb)(0,t) = E 


N 


z=l 


yyRi{t)hi{t)7Ai 

i=i 

N 


2 = 1 


Here the expected values are dehned with respect to the true distribution of 
Wj and Z,-. 


Proposition 1. (based on Theorem 2.1 of Lin and Wei (1989)) LetOpi be 
the MPLE of the log hazard ratios for the overall treatment effect as defined 
in Q). When n,m ^ oo, §pl converges in probability to the unigue solution 
of the following eguation: 

- r = 0 . ( 5 ) 

Jo Jo s^^fi9,t) 

□ 

Without censoring it follows the dehnition of Ri{t) and the hazard hfit) that 
E[Rfit)hi{t)\Z,] = P[W, > t\Zi]R{t). 

The right hand side of the above formula is simply the pdf of Wt by the dehni¬ 
tion of the hazard hfi-). It can be written as fx(t\Z) or /y(t|Z) respectively, 
in the form of 

ho{t) exp(d'Z,)e-""P(“'^bJ^o(b or exp(;g'Zi)e-""P(^'^bHo(b 

depending on whether the z-th patient is from the hrst or the second trial. 
Hence s^^\t) and s*'^^(t) can be simplihed by the following calculation 

E[Rfit)hfit)] = E[E[Rfit)hi{t)\Z,]] 

= Ez[fx{t\Z)] if Wi e X, Ez[fY{t\Z)] otherwise. 











Similarly t) and t) can be simplified using i?[i?j(t)|Zj] = P\Wi > 

t|Zi] in the form of 


g-rro(t)exp(o'ZP g-Ho(i)exp(/3'Zi) 


Expand the shorthand notations defined for Proposition 1, we have 

= E[n/x(t|Z)+m/y(t|Z)], 

= E[n/x(t|Z)Z + m/y(t|Z)Z], 

s^^\e,t) = E[nP{X>t)e^'^+ mP{Y >t)e^'% 

(0, t) = E[nP{X > t)e^'^Z + mPiY > t)e^"^Z]. 


All the above notations turn out to depend on no random variables other 
than the covariate Zj’s. The subscripts for Z are suppressed with the as¬ 
sumption that the Zj’s are independent and identically distributed. Under 
mild smoothness conditions, the first term of equation (|^ can be simplified 
by switching the order of the integrals: 


s^^\t)dt = E 


n / fx{t\Z)dtZ + m / fY{t\Z)dtZ 


= NE{Z). 


Plug in (|^ with the definitions of the pdf’s and survival functions. For suf¬ 
ficiently large n and m, substitute n/{n + m) by p, i.e., the fixed ratio of the 
study sizes. Eliminate iPo(^) by letting u = iPo(^) and hence du = ho(t)dt. 
We derived an equation for the definition of the overall treatment effect. 


Corollary 1.1 Assume no censoring in the data X and Y. The true treat¬ 
ment effect (log hazard ratios) from either trials are known and denoted by 
a and (3 respectively. The overall log hazard ratio 6*p^ defined as the limit of 
OpL with n, m —)■ cxD is the unigue solution to the following eguation: 


E{Z) = 


oo pE ) + {l-p)E 


'o pE {e^"2‘-e^v{a'7.)u'^ + (1 “ p)E 

pE /^e“'Z-exp(a'Z)n^ _|_ |^g/3'Z-exp(/3'Z)« 


( 6 ) 


du. 


□ 
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The distribution of the covariates Z can be well approximated using data 
from the general intent-to-treat population. Once the distribution of Z is 
known, equation ([^ can be solved numerically. 

Usually one needs all N patient line data to dehne the MPLE 6pl as well as 
its sandwich-type robust variance estimate. Nonetheless, if only aggregate 
data such as the MPLEs a and f3 from individual trials are available for 
some reason, one may solve equation (|^ with a and /3 substituted by their 
estimates a and jS. The solution to such an equation, which is henceforth 
denoted by 9m (the subscript M stands for “misspecified model”), does not 
rely on the knowledge of the baseline hazard ho(-). It is a semiparametric, 
asymptotically efficient estimate to 9*pj^ because the MPLEs a and /3 are 
based on maximum likelihoods. Such procedures are well known for being 
invariant with regard to functional transformations. 

The condition of no censoring in Corollary 1.1 is natural for the dehnition 
of because we are only interested in the performance of the therapy. 
Censoring is considered as noise imposed on the observed survival times and 
should be excluded from the estimating procedure if at all possible. The 
MPLEs a and {3 reported for the individual trials are (asymtotically) unbi¬ 
ased for the underlying log hazard ratios a and {3 even if the data X and 
Y are censored. Therefore 9m always remains an unbiased estimate for the 
overall treatment effect 9*pj^ no matter the data are censored or not. 

Example 1. Usually the effects of the covariates are assumed to be sorted 
out by proper randomization. One of the most important analysis used in 
practice use the treatment arm indicator Z = 0/1 as the only covariate 
with q = P{Z = 1). Denote the hazard ratio of the first trial by a = e“, 
the hazard ratio of the second trial by 6 = e^. By corollary 1.1, the MPLE 
cpL = Exp(9pl) calculated from the uncensored pooled line data of N patient 
records converges to the solution of the following equation about c: 

^ (^1 _ g)e““ -I- + (1 — p)qbe~'^^ 

Jo (1 ~ q')e““ -I- pqce~°''^ + (1 ~ p)qce~^^ 

+ (1 — p)ce“^“] du. 

Equation ([^ can be simplihed after a series of basic algebraic transforma- 
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tions: 


'OO 


( 8 ) 


1 = 


(1 — q)e “ + pqae + (1 — p)qbe 


• e '^du 


JO (1 “ g)e““ + pqce~°‘^ + (1 “ p)qce~’^ 

It is easy to see that the right-hand-side of ([^ is a strictly decreasing, convex 
fnnction of c. It implies that the overall hazard ratio dehned by the soln- 
tion Cp^ to eqnation mnst take a valne on the open interval (a, 6). In 


Appendix 5.2, we will also show that in most cases Cp^ is snperior to (i.e., 
smaller than) the linear alternative exp [pa -|- (1 — p)/3] and pa + [1 — p)h. □ 


Now take a closer look at Op^. It is conventionally considered as the best 
estimate to the overall log hazard ratio of the combined trials. However, 6*pj^ 
is actnally dehned to be the limit of the nncensored Opp. The estimator Opp 
is snbject to the impact of censoring and can be biased for 9*pp. By (|^, the 
valne of 9*pp only relies on d, /? and the distribntions of Z. 


Corollary 1.2 Assume independent right censoring for both X and Y such 
that the survival function of the censoring times are denoted Cx{t\Z) and 
Cy(f|Z) respectively. The limit of9pp with n,m ^ oo is the unique solution 
to the following equation with respect to 9 with known values of a and (3: 

E\pfx{t\Z)Cxm)Z + (1 - p)fY{t\Z)Cy{t\Z)Z]dt = (9) 



/•OO pE + {l-p)E 

0 pE (e®'z-*^>^p(“'z)-f^oWC'x(f|Z)) + {l-p)E {e^"^-^^pCh'mo{t)Cy{t\Z)) 

[pE {fx{t\Z)Cx{t\Z)) + (1 - p)E {fY{t\Z)Cy{t\Z))] dt 


□ 

With a mixed population, censoring contains information about the source of 
the data, which is correlated with the length of the subject’s expected survival 
time beyond censoring. The limit oi 9pp must contain all such information 
as the censoring mechanism Cx{t\Z), C'y(tlZ) and the baseline hazard hfo(^)- 

Example 2. Assume the same setup as in Example 1. The survival times 
observed from either trials follow Cox models dehned with a treatment arm 
indicator Z ~ Bernoulli{q). Both clinical trials will be terminated at given 
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time Tjnax > 0. Hence the censoring time has a point mass of unity at T^ax- 
That is, Cx{t\7i) = Cy(t|Z) = 1 if t < Tmax and 0 otherwise. Expand 
the dehnitions of the distribution functions and use the change-of-variable 
technique by letting u = HQ^t) in equation (|^, it becomes 


l'Ho{Tmax) 


3 -Ho(r_)_ / . (1 - g)e " + pgae + {1 - p)qbe 

( 10 ) 

The limit of the MPLE Opi calculated from the censored line data is the 
solution c* to the above equation. To study the bias in Opp, we need to 
compare c* against Cp^, which is the solution to equation ® . For simplicity, 


(1 — q)e “ + pace + (1 — p)qce 


denote the integrand in equation (10) by g{u\c) for hxed a and b. Equation 
(|^ asserts that g{u\c*pp) is a well dehned pdf because its integral on (0, cx)) 
equals unity. Consider the fact that c*pp < pa + {1 — p)b (Appendix 5.2). It 
is easy to prove that the fraction term 


(1 — q)e “ + pqae + (1 — p)qbe ^ 

(1 — g)e““ + pacppe“““ + (1 — p)gc*PL^~’^^ 

ifO < M < [ln(l—p)(cpp -b)/p/{a —c*pl)\I{ b—a) and it is less than 1 otherwise. 
Hence the pdf g{u\c*pp) only intersects a standard Exponential pdf 6““ at 
one point, which in turn implies that the distribution dehned by g{u\c*pi) 
is stochastically smaller than a standard Exponential distribution (Fill and 
Machida 2001). Therefore, the cdf corresponding to g{u\c*pi) is always bigger 
than that of the standard Exponential distribution e~^du = 1 — e~^ for 
any given H: 




< 


rHoiTmax) U _|_ (X — p)qbe 

) (1 — q')e““ + pac*ppe~°''^ + (1 ~ p)?Cppe“^“ 


■e ^du. 


Note that g{u\c) is monotonically decreasing with respect to c. To make an 
equality as in equation (10), its solution c* must be greater than Cpp. When 


the survival time data are censored at a maximum allowable trial length 
Tmax, OpL is always associated with a positive bias. The bias decreases with 


T □ 

-‘-max' '—I 


Corollary 1.2 indicates that the most accepted MPLE Opp is not robust 
against variability in the treatment effects observed from multiple trials. 
Hence we recommend reporting the overall log hazard ratio for multiple trials 


12 







using 9m rather than Opi- The former is not only robust (unbiased despite 
of censoring) but also provides better chance for the researchers because it 
only requires aggregate statistics from each sub-population of concern. The 
following example illustrates the impact of censoring on 9pi using simulated 
data. 


Example 3. We performed 1000 rounds of independent simulations to mimic 
the following scenario: the survival times of 200 patients treated in the hrst 
trial follow an Ea:p(0.3) distribution. With 1:1 randomization (i.e., q = 0.5), 
another 200 patients in the control group has survival times sampled from a 
standard exponential distribution Exp{l). Let p = 0.7, the second trial en¬ 
rolls 85 patients for the treated group and 85 patients for the control group. 
To mimic a lower drug efficacy, the survival times of the treated patients 
from the second trial are sampled from an Exp{0.8) distribution, while the 
survival times of the control group patients are sampled from Exp{l). 

Without censoring, we can calculate Opp for each of the 1000 simulated data 
set and summarize the distribution of using its empirical distribution. In 
this example, it was reported that E(6pl) = —0.926 (equivalent to ln(0.396)) 
with a 95% conhdence interval of (—1.088, —0.756). According to Corollary 
1.1 this is an (asymtotically) unbiased estimate for the true overall log hazard 
ratio 9*pj^ as there is no censoring. 


We continued to censor the 1000 simulated data sets using various censoring 
time Tmax £ [1,10] and tried to calculate the estimates 9m and 9pl respec¬ 


tively for eac 
to equation (8 


i^specihc Tmax- In Figure l(i) are reported as the solution 
with a and (3 substibuted by a and /3, which are calculated 
from the censored trial 1 and trial 2 data respectively On the other hand, in 
Figure l(ii) the MPLE 0 pl’s are calculated using (^ with all 570 censored 
line data from either trials pooled together. 


The gray lines in Figure 1 outline the true overall hazard ratio (-0.926) and 
its 95% conhdence interval (—1.088, —0.756). The bias of 9pl can be easily 
observed in Figure l(ii). When the trials are censored at T^ax = 1, about 51% 
of the collected data are censored. The MPLE 9pl reported for this scenario 
has a mean of -0.854 and a 95% conhdence intervals of (—1.074, —0.601). It 
accounts for about 7.8% of positive bias in the log hazard ratio estimates. 
Such a result can be of serious concern for those clinical trials expected to 
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% of censoring % of censoring 

29 6 1 29 6 1 




Figure 1: (i) Mean and the 95% Cl for 9m- (h) Mean and the 95% Cl for 

OpL- 

be associated with low event rates, e.g., a trial for breast cancer treatments. 
For the same censored dataset, 9m displayed in Figure l(i) appears to be 
unbiased for 9*pi^. The conhdence interval of 9m tends to be wider with more 
data being censored because the variance of the log hazard ratio estimate 
is proportional to the inverse of the number of observed events (Kalbfleisch 
and Prentice (2002)). □ 

3 Combined hazard ratio via harmonic means 

We noted that 9pl and 9m are semi-parametric estimators without requiring 
any knowlege about the baseline. However, the baseline hazard hwo{') of the 
misspecihed model (|^ is different from the baseline hazard ho(-) dehned for 
each individual clinical trial. It is due to the fact that the MPLE procedure 
is the result of simultaneous maximization of the unknown parameter 9 and 
the discretized baseline hazard hwo{') in the form of a Breslow hazard esti¬ 
mate (Breslow (1972), Johansen (1983)). The shape of the baseline hazard 
ho(-) is twisted to £t the mixed event times and in return leads to a slightly 
under-estimated therapy effect, or equivalently an over-estimated log hazard 
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ratio 6*p^ in many cases. We will propose another method to avoid snch 
undesirable effect. 


Example 4. Let Xi,..., X„ ~ Exp{a) and X„+i,..., X 2 n ~ Exp{l) be the 
i.i.d. event times observed from the treatment and control groups of the first 
trial, W,..., Wi ~ Exp{b) and Wj+i, ..., Y 2 m Exp{l) be the event times 
recorded from the treatment and control groups of the second trial. Without 
loss of generality, assume a < 6 < 1. No censoring is allowed for simplicity. 
Provided the complete patient line data, it is easy to estimate the treatment 
effects of the therapy in either clinical trial using the MLEs a = n/ 
and b = rn/Yl^=i^j- Assuming a misspecified Exp{c) model for the pooled 
data, it is natural to estimate the overall treatment effect using 

^ n + m 1 

n/{n + m)a + m/{n + m)b 


p/a + (1 - p)/b’ 

Again, the overall hazard ratio can be defined as the limit of the chosen 
estimate c. It turns out to be the harmonic mean of the individual trial 
effects a and b weighted by the ratio of the study sizes p : {1 — p). As a 
matter of fact, the harmonic mean effect is smaller than that defined by the 
semiparametric MPLE method in most cases (Appendix 5.2): 


p/a + {l-p)/b 


if a < 6 < 1, 


where c*pp is the solution of equation (|^ as well as the limit of the MPLE 
exp(6'pL) calculated using {Wi ,..., W 2 n+ 2 m} = {Ai..., X 2 n, W,..., Y 2 m} 
without assuming an underlying exponential hazard. The difference between 
the two versions of the overall treatment effect estimate can be attributed to 
a twisted baseline hazard hwo{-) approximate due to the MPLE procedure. 
Consider the two trials respectively, the Breslow hazard estimates 


E£=i Rxe{Xi)a^^ 


or 


E 


ES RYi{Y,)b^^ 


calculated from either sets of patient line data are both asymptotically un¬ 
biased for the true underlying linear culmulative hazard function Ho{t) = t. 
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Here the only covariate = 0 if the patient is in the control group and 1 
if the patient is in the treatment group. The numerator of the fraction is 
always equal to one because there are no censoring. However, the Breslow 
estimate calculated from the pooled data hhj’s does not lead to a constant 
hazard estimate over time. At arbitrary time t, the non-parametric baseline 
hazard hwo{t) as a limit of the Breslow estimate is 


hwoit) 


pe + (1 — p)e + e * 

+ (1 — p)e“^*]cp^ + e~* 


( 11 ) 


The derivation of the above formula is provided in Appendix 5.1. Note that 


, pa+{l-p)b + l a 

hwo{0) = -;——,-and hwofoo) = 


■'PL 


+ 1 


-'PL 


Unless a = b, the shape of is tilted to the right by the fact that 

pa + (1 — p)b > c*pj^ and a < c*pp (Appendix 5.2). For sufficiently large t, 
hwoit) is monotonically decreasing and hwoit) = 1 happens at only one time 
point t. The trend in hwo{t) is consistent with the monotone changes in 
the mixture proportion of the complete population. At the very beginning, 
the estimated baseline hazard is bigger than one. The proportion of treated 
patients from the first trial increases with time because the patients from the 
second trial have shorter expected life (a < b) and finally the baseline hazard 
estimate is dominated by a, the effect of the first trial, and scaled by 1/cp^, 
such that the average baseline hazard is close to one. Hence a smaller com¬ 
bined effect (i.e., bigger hazard ratio c*pp) is given by the MPLE procedure 
compared to the harmonic mean effect based on the parametric Exponential 
model. In a sense, the MPLE procedure is an overfit to the data if the re¬ 
searcher is confident about the fact that the control groups from either trial 
are not essentially different. □ 


It is curious to see that the harmonic mean type of definition for the com¬ 
bined trial effect can be extended to address statistical models assuming 
much more general conditions where neither the underlying exponential dis¬ 
tribution nor the univariate covariate structure is needed. Let X denote the 
observed survival times of n patients recruited for a clinical trial. Their 
uncensored survival times follow a proportional hazard model with haz¬ 
ard hxit\7i) = /io(t) exp(Q;'Z). Similarly, the survival times (Y) of another 
m = nil —p)/p patients recruited for a second trial also follow a proportional 
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hazard model with the same baseline hazard ho(t) and a log hazard ratio (3. 
Here d is a fc-variate vector of the same structure as f3 but of different values. 
The formulation of the baseline hazard ho{t) is unknown. The MPLE is not 
appropriate for the estimation of the log hazard ratio {6) for the combined 
population if one needs to avoid a twisted baseline hazard. Instead, we set out 
to dehne the MLE for 6. It leads to a different version of a semi-parametric 
estimate {Ohm, where the subscript HM stands for “harmonic mean”) for 
the overall log hazard ratio. It is based on aggregate statistics only. Patient 
line data are not required for the realization of Ohm- 


Proposition 2. Let Omle be the maximum likelihood estimate (MLE) for 
the overall log hazard ratio 0 when a proportional hazard model with the 
following pdf is fitted to the pooled N = n + m surival time records: 

f{t]0\Z) = > 0. (12) 

When n,m ^ oo, Omle converges in probability to a constant which is 
the unigue solution to the following eguation with respect to 0: 


E{Z) = E 


e^'^Z 


P 

e&'z 


+ 


1 

e/3'z J 


(13) 


Proof. Again, we assume no censoring when trying to dehne the “true” values 
of the overall treatment effect because no information other than the mea¬ 
surements of the treatment effect should be of concern. Such conditions can 
be loosen when we get to the discussions about the estimating procedures 
for the treatment effects established here. 


Assuming the misspecihed proportional hazard model (12), the joint pdf of 
the N observed survival times is 


N 




n^o(T*)e'^'' 

i=l 

The derivative (w.r.t. 0) of the logarithm of the misspecihed joint pdf is 

N 

V/(Ti,..., T^, Zi,..., Z^) = Z, - e^"^^Z,Ho{Tf). 


2 = 1 
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By the law of large number, 




En 




ZHo{T) 


where E„ 


stands for the expectation dehned in terms of the true mixed 


model with pdf fmix- 

fniix{t,z) = + (1 - p)ho{t)e^'''e~^^ 

This is a typical setup of an estimating equation. Under mild regularity 
conditions, it can be proved that the MLE 9 defined by the solution to the 
equation = 0 converges in probability to a. 9 hm, which is the solution to 


En 


J'Z 


ZHo{T) 


= 0 . 


(14) 


Detail discussions about the asymptotics of the MLE derived from misspec- 
ihed models are available in White (1982). Note that (14) is equivalent to 

Emix ( ^ ) Enx 


J'Z 


ZHoiT) 


= Ez 
= Ez 




mix 

e'z 


,0'Z 


ZHo{T)\Z 


ZE^UHoiT)\Z) 


(15) 


dt 


To simplify the right hand side of (15), we calculate 
E„,,,[Ho{T)\Z] 

poo 

= / Ho{t)fmixit\'Zi)dt 

Jo 

poo 

Jo 

poo 

= u [pae-““ + (1 - p)be-’’^] du 

Jo 

= p/a + {l-p)/b. 

Note that hQ{t)dt = dH^it). The third line of the above equation was due 
to the following dehnition of notations: iLo(^) = u, = a and = b. 
Equation (13) follows (15) with the dehnition of Emix[HoiT)\Z] plugged in. □ 
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Following the discussions about misspecified models in White (1982) and 
Akaike (1973), it can be seen that equation (14) actually dehnes 6*^^^ as the 
maximizer of the following expectation: 


ain(/(T;»|Z)/z(Z))]. 


Hence, has an obvious geometric interpretation. It minimizes the Kullback- 
Leibler distance between the candidate working models /(f; 6*|Z) and the true 
model fynixiAT,)-. 

^HM = arg min 9)] 

9 

= £ngmmE^i^[ln{fmix{T,Z))] - Ernix[Mf{T,6,Z))]. 

9 

Now we have another version of the dehnition for the overall log hazard ra- 
tio It can be slightly smaller than 9pj^ in most cases (Appendix 5.2). 

Both of these two numbers are valid measurements of the treatment effect 
on the combined population, though they are based on different modelling 
assumptions. We consider 9pj^ a number closely related to the harmonic 
mean. Such an idea can be illustated by the following example. 


Example 5. Assume the same setup as in Example 1. The treatment 
group indicator Z ~ Bernoulli{q) is the only covariate collected for the 
enrolled patients. The observed survival times (with or without censoring) 
follow proportional hazard models with hazard ratios a = e“ and b = 


respectively for trial 1 and 2. By equation (13), the overall hazard ratio ^*HM 
is the solution to the following equation: 


q = E{Z) = E 




p l-p 
= qc\- + —— 
a b 


^HM — 


p/a+ {1 - p)/b' 


□ 


( 16 ) 


Note that exponential survival times are not required in Example 5. The 
harmonic type of calculation (16) is applicable to any general proportional 
hazard modelling setup. 
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Substitute a and /? by the corresponding MPLEs a and (3 in equation (13). 
Solve the estimating equation about 6 and denote the solution by Ohm- It is 
an asymtotically efficient estimate for because both a and /3 are based 
on maximum likelihood principles and hence are invariant to any functional 
transformations. The asymptotic variance of Ohm can be derived using the 
delta method. Consider Ohm as an implicit function of a and /3 by the es¬ 
timating equation (13). Let djO = {dOi/daj,... ,d9k/daj), Vj = 
Assuming mild regularity conditions, e.g., dominated convergence for the 


variables dehned in (13), one may switch the order of integrations and dif¬ 


ferentiations. The values of djO can be calculated by solving the following 
linear system about djO: 


E 




P 1 -P 
J'Z 


:,a'Z 


Z'dj9 


= E 


O'Zr 


■ PZj 

‘ cOi'Z 


Similarly it is easy to derive the formulas for {d9i/dl3j,... ,d9k/d/3j), Vj = 
1,... ,k. Usually the variance-covariance matrices of a and (3, denoted by 
Var{a and Var(/3), are reported together with the point estimate values. By 
the delta method, the asymtotic variance of Ohm can be calculated: 


/ \ , ... d0,/da, 

V dHt/dai ... aotiapt J I I V 

A wald test can be developed for the values of 9*jjj^ with its variance-covariance 
matrix calculated as above. 


Example 5. (cont.) When there is only one Bernoulli covariate Z as was 
specihed in Example 5, the variance of the log hazard ratio estimate 

Ohm = hi-^ 

p/ exp(Q;) + {1- p)/ exp{/3) 

can be calculated using the delta method: 


Var{9HM) 


p2e ‘^°‘Var{a) + {1 — p^e ‘^^Var{/3) 
(pe““ + (1 — p)e~^y 


When a = f3, the above formula degenerages to VarijOnM) ~ p^Var{a) + {l — 
pyVar{(3), implying that Ohm is equivalent to 0^ = pa + {l—p)/3 in this case. 
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A Wald test for the overall treatment effect Hq : 6*^j^ = 0 can be defined 
using the test statistic 9/\JVaripHu)- D 

4 Conclusion 

In this paper we investigated various methods for the estimation of the overall 
treatment effect observed from a mixed patient population. Linear estima¬ 
tors in the form of 6^ = Yhi or cl = Xli have been the favorite 
of many researchers for their simplicity. However, it is not mathematically 
justihable to approximate the notoriously nonlinear hazard ratio using any 
linear estimators if the patient responses to the treatment are highly diver- 
sihed in various sub-groups of the intent-to-treat population. In particular, 
we showed that cl > exp(0i) and both of them are, in most cases, positively 
biased for the hazard of the combined treated patients. 

We propose that an appropriate dehnition of the overall treatment effect for 
a mixed population should be first of all based on an estimating procedure 
that is justihable from either a clinical or statistical perspective. The hrst 
candidate meets such criterion is the MPLE dpi calculated from the pooled 
patient line data. It converges to a well-dehned overall log hazard ratio 9*pp 
for the combined trials if the observed event times are not censored. How¬ 
ever, 6PL is biased if the data are censored as in most of the real life examples. 

The MPLE Opp has a robust version 6m dehned with a misspecihed pro¬ 
portional hazard model. It is an asymptotically efficient semi-parametric 
estimator calculated from the aggregate statistics a and jS. It converges to 
6*pp with increasing sample sizes despite censoring in the data. We noted that 
the baseline hazard function is twisted when applying the MPLE procedure 
to the observed survival times. To avoid tampering the shape of the non- 
parametric baseline hazard, we proposed a harmonic mean type of estimator 
Ohm- Again, it is a semi-parametric estimator based on a and 13 only. It con¬ 
verges to a 9*fjM, which minimizes the Kullback-Leibler distance between the 
true mixed proportional hazard model and the misspecihed working model. 
The variance-covariance matrix of Ohm can be calculated using the delta 
method and the reported Var{a) and Var{l3). We also derived a Wald test 
for the values of 0*^j^. 
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5 Appendix 

5.1 Example 4 (cont.) 

In Example 4 of section 3, we noted that the MPLE procedure together with 
the Breslow estimate can twist the shape of the non-parametric baseline haz¬ 
ard function. Details of the calculations are provided here. 


Assume the same Exponential setup delineated in Example 4. At arbitrary 
time t > 0, let tc be the event time (it can be observed from either trial) 
right before t and be the next event time. The Breslow hazard estimate 
for the inhnitestimal time interval {w,w^) is 


|n.v(*) + + (n.Vc(«) + mYc(t)) ’ 


(17) 


where nx(t) and myit) are the number of treated patients still at risk up to 
time t, nxc and niYc are the number of living control group patients. The 
formulation of dAo(t) is discussed in many research papers and texts, e.g., 
Breslow (1972), Kalbfleisch and Prentice (2002) and Hanley (2008), though 
the meaning of the formulas remains unclear to many statisticians. It is 
corresponding to a discretized Poisson process with constant hazard between 
consecutive event times. All history up to time w can be ignored since 
the hazard is dehned as a conditional probability for the future beyond w. 
Consider a control group patient being alive at time w. The estimate dA^it) is 
in fact an empirical approximate for the probability of observing this control 
group patient die within the time interval [w,w+): dAo(t) = P[Xc < (tc+ — 
tc)], where Xc ~ Exp{Xo) is the event time of the imaginary control group 
patient with a to-be-estimated intensity parameter Aq specihcally dehned 
for the time interval [w,Wy). This is consistent with the dehnition of the 
cumulative hazard: 



So{w+) _ So{w+) - So{w) 
So{w) So{w) 


P[Xc < {w+ - w)]. 


where 5*0(■) denotes the baseline survival function. The length of the time 
interval (tc+ — w) is also exponentially distributed, according to the true 
underlying distribution, with an intensity of 


nx(t)a + my(t)6 -h {nxdt) -h niYcit)) 
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because each of the nxif) living treated patients from trial 1, the myit) 
treated patients from trial 2 and the {nxc{t) + 'i^Yc{i)) control group patients 
can be considered as a competing risk. Use the joint pdf of two independent 
exponential random variables {X^ vs. — w) to calculate that 

dAo(t) = P[Xc < (tc+ — w)] = - -T--- 

Ao + nx{t)a + my(f)6 + {nxc{t) + mycit)) 

by pv! _1_ 

[nx{t) + my{t)]cpL + {nxc{t) + mydt))' 

Solve the equation for Aq. It follows 

^ _ nx{t)a + my{t)b + {nxcjt) + mycjt)) 

° [nx{t) + my{t)]cpL + {nxc{t) + mydt)) - 1' 

Consider that all patients in Example 4 have exponential survival time of 
various intensity (a, b and 1 respectively), the law of large number guarantees 


n + m 


—)■ pe~ 


^ n + m 




n + m 


Hence the limit of Aq, the estimated hazard for a control group patient living 
within the inhnitestimal time interval {t, t + dt) is 


^ ^ pe + (1 — p)e “6 + e * 

0 + (1 — p)e“^*)cp^ + 

This is formula ([IT|. 


5.2 Inequalities for different versions of the combined 
treatment effect 

We have investigated various dehnitions of the overall treatment effect for a 
mixed patient population. Here we are going to demonstrate the quantita¬ 
tive relationship between these dehnitions assuming the simplest modelling 
setup as was described in Example 1. That is, the treatment group indicator 
Z ~ Bernoulli{q) is the only covariate for the proportional hazard models. 
The survival times of the enrolled patients follow proportional hazard models 
with hazard ratios a = e“ and b = respectively for trial 1 and 2. The ratio 
of the sample sizes of the two trials always equals p : (1 — p). We have the 
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following candidates for the definition of the overall (log) hazard ratio: 

i) 6L=pa + (1 -p)/3, 

ii) Cl = pa + {1 - p)b, 


111 


) c*p]^ as the solution to equation 


iv) the harmonic mean of a and b as was dehned in (16). 


Proposition 3. For arbitrary a < 6, g G (0,1) and p G (0,1), the following 
inequalities always holds true: 

1) a < < exp{6L) < cl < b; 

2) a < c*pp < Cl < b. 


Proof. The harmonic mean > a is trivial. 

To compare ^*HM with exp(6'L), consider their ratio 

exp(6'i:,) /ON'",. . /a\p 


^HM 




Denote the above ratio by i?(a, b) and calculate 
dR{a, b) 


db 


= p(l—(1/a — 1/6) > 0, Va < 6. 


Combined with the fact that i?(a, 6 = a) = 1, it indicates i?(a, 6) > 1 for any 
b > a therefore we proved 


Chm < exp(0L). 

Using the Jensen’s inequality with the convex function g{x) = e^, it is easy 
to prove that 

exp(0i) = < pe“ + (1 - p)e^ = cl- 

It is also trivial to see that the algebraic mean cl < b. 


For simplicity, denote the integrand in ([^ by f{u, a, b, c). To compare a and 
c*pp, note that 


f{u, a,b,c = a) 


(1 -q)e ^ + pqae + (1 - p)qbe ^ 
(1 — g)e““ + pgae“““ + (1 — p)gae“^“ 
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Hence f{u, a, b,a)du > 1. To achieve an equality in ([^, one must have 
the solution > a because f{u, a, b, c) is decreasing with respect to c. 

To compare c*pj^ with c^, observe that 

[pgae“““ + (1 - p)qbe~'^] - \pqcLe~'"'^ + (1 - p)q'CLe“'’“] 

= p(l-p)g(a-6)(e-““-e-'’“) < 0 

Therefore f{u,a,b,CL) < e““ and equivalently f{u, a, b,c*pp)du < 1. To 
achieve an equality in ([^, one must have c*pp < cl- □ 

One may also be tempted to hnd a hxed order for vs c*pp and c*pp vs 
exp(6'i). However, close examination of the algebraic dehnitions implies that 
the result of these comparisons depend on the values of the parameters. The 
rule of thumb is, < c*pp < exp(6*L) for most a < 6 < 1 of practical 
importance. Only when a is extremely small (typically smaller than 0.2) one 
can observe exp(6'i) < c*pp. When 1 < a < 6, c*pp and are pretty close 
to each other and Proposition 3 indicates that ^*HM < exp(6*i). Here is an 
example demonstrating the relationships of these estimators. 

Example 6. Let p = 0.5, q = 0.5. We plotted the percentage differences 
between the three estimators in Figure 2. The second estimator in the list is 
always the basis for comparison. 

Note that when a assumes a decent value, e.g., any number greater than 0.5, 
the linear log hazard ratio estimate 6^ always leads to a conservative deh- 
nition for the overall treatment effect compared the limit of the MPLE 
or the limit of the MLE ^HM- The bias in exp(6*/,) can be higher than 38% 
depending on the distance between a and b. Table 1 are the exact values of 
the overall hazard ratios dehned for various combinations of a, b: 
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0.5 

1.0 

1.5 

2.0 

2.5 
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